41 research outputs found

    Built to Last or Built Too Fast? Evaluating Prediction Models for Build Times

    Automated builds are integral to the Continuous Integration (CI) software development practice. In CI, developers are encouraged to integrate early and often. However, long build times can become a problem when integrations are frequent. This research focuses on finding a balance between integrating often and keeping developers productive. We propose and analyze models that can predict the build time of a job. Such models can help developers better manage their time and tasks. Project managers can also explore different factors to determine the best setup for a build job, one that keeps the build wait time at an acceptable level. Software organizations transitioning to CI practices can use the predictive models to anticipate build times before CI is implemented. The research community can modify our predictive models to further understand the factors and relationships affecting build times. Comment: 4-page version published in the Proceedings of the IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pages 487-490. MSR 201
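    The abstract does not say which model family is used. As a minimal sketch, one option is ordinary least-squares regression of build time on per-job factors. The two features here (`lines_changed`, `test_count`) and the toy history are hypothetical, chosen only to illustrate fitting a predictor and querying it for a new job.

```python
from typing import List, Sequence


def fit_ols(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Fit targets ~ b0 + b1*x1 + ... + bk*xk by solving the normal equations."""
    # Prepend a constant 1.0 to every row so b0 acts as the intercept.
    rows = [[1.0, *map(float, xs)] for xs in features]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(k):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coefs: Sequence[float], xs: Sequence[float]) -> float:
    """Predicted build time for one job's feature vector."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], xs))


# Hypothetical history: (lines_changed, test_count) -> build time in seconds.
history_features = [(10, 5), (40, 20), (100, 8), (5, 50), (60, 30)]
history_times = [45.0, 90.0, 96.0, 132.5, 120.0]

coefs = fit_ols(history_features, history_times)
print([round(c, 3) for c in coefs])         # [30.0, 0.5, 2.0]
print(round(predict(coefs, (20, 10)), 1))   # 60.0
```

    The toy data were generated as 30 + 0.5 * lines_changed + 2 * test_count, so the fit recovers those coefficients exactly. Real build data would be noisy and would likely need more factors and a held-out evaluation.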

    Supporting Development Decisions with Software Analytics

    Software practitioners make technical and business decisions based on their understanding of their software systems. This understanding is grounded in their own experiences, but it can be augmented by studying various kinds of development artifacts, including source code, bug reports, version control metadata, test cases, usage logs, etc. Unfortunately, the information contained in these artifacts is typically not organized in a way that is immediately useful for developers’ everyday decision-making needs. To handle the large volumes of data, many practitioners and researchers have turned to analytics, that is, the use of analysis, data, and systematic reasoning for making decisions. The thesis of this dissertation is that by applying software analytics to various development tasks and activities, we can provide software practitioners with better insights into their processes, systems, products, and users, helping them make more informed, data-driven decisions. Quantitative analytics can help project managers understand the big picture of their systems, plan for their future, and monitor trends. Qualitative analytics can enable developers to perform their daily tasks and activities more quickly by helping them better manage high volumes of information. To support this thesis, we provide three different examples of employing software analytics. First, we show how analysis of real-world usage data can be used to assess users' dynamic behaviour and the adoption trends of a software system, revealing valuable information on how software systems are used in practice. Second, we have created a lifecycle model that synthesizes knowledge from software development artifacts, such as reported issues, source code, discussions, community contributions, etc. Lifecycle models capture how various development artifacts change over time in an annotated graphical form that can be easily understood and communicated.
We demonstrate how lifecycle models can be generated and present industrial case studies in which we apply these models to assess the code review processes of three different projects. Third, we present a developer-centric approach to issue tracking that aims to reduce information overload and improve developers’ situational awareness. Our approach is motivated by a grounded theory study of developer interviews. The study suggests that customized views of a project’s repositories, tailored to developer-specific tasks, can help developers better track their progress and understand the technical context of their working environments. We have created a model of the kinds of information elements that developers consider essential for completing their daily tasks. From this model we have developed a prototype tool organized around developer-specific customized dashboards. The results of these three studies show that software analytics can inform evidence-based decisions related to user adoption of a software project, code review processes, and developers’ awareness of their daily tasks and activities.
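    The dissertation does not spell out how a lifecycle model is computed. A minimal sketch, assuming each artifact (here, a code review patch) emits timestamped state-change events, is to count state-to-state transitions and normalize them per source state. The counts and probabilities could then annotate the edges of the lifecycle graph. The state names (`submitted`, `reviewed`, `revised`, `merged`, `abandoned`) and the event records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (artifact_id, state, timestamp); state names below are hypothetical.
Event = Tuple[str, str, float]


def transition_counts(events: List[Event]) -> Dict[Tuple[str, str], int]:
    """Count how often artifacts move from one state to the next."""
    histories: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for artifact, state, ts in events:
        histories[artifact].append((ts, state))
    counts: Counter = Counter()
    for history in histories.values():
        history.sort()  # order each artifact's states chronologically
        for (_, src), (_, dst) in zip(history, history[1:]):
            counts[(src, dst)] += 1
    return dict(counts)


def transition_probabilities(counts: Dict[Tuple[str, str], int]) -> Dict[Tuple[str, str], float]:
    """Normalize outgoing counts per source state (edge labels for the graph)."""
    outgoing: Counter = Counter()
    for (src, _), n in counts.items():
        outgoing[src] += n
    return {(src, dst): n / outgoing[src] for (src, dst), n in counts.items()}


events = [
    ("p1", "submitted", 0), ("p1", "reviewed", 1), ("p1", "revised", 2),
    ("p1", "reviewed", 3), ("p1", "merged", 4),
    ("p2", "submitted", 0), ("p2", "reviewed", 2), ("p2", "merged", 5),
    ("p3", "submitted", 1), ("p3", "abandoned", 9),
]
counts = transition_counts(events)
probs = transition_probabilities(counts)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {counts[(src, dst)]} ({p:.2f})")
```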

    Attaching Social Interactions Surrounding Software Changes to the Release History of an Evolving Software System

    Open source software is designed, developed, and maintained by means of electronic media. These media include discussions on a variety of issues that reflect the evolution of a software system, such as reports on bugs and their fixes, new feature requests, design changes, refactoring tasks, and test plans. Often this valuable information is simply buried as plain text in mailing list archives. We believe that email interactions collected before a product release are related to its source code modifications. Even when they do not immediately correlate with change events in the current release, they might affect changes in future revisions. In this work, we propose a method to reason about the nature of software changes by mining and correlating electronic mailing list archives. Our approach is based on the assumption that developers use meaningful names and their domain knowledge when defining source code identifiers, such as classes and methods. We employ natural language processing techniques to find similarity between the source code change history and the history of public interactions surrounding these changes. Exact string matching is applied to find a set of common concepts between the discussion vocabulary and the changed code vocabulary. We apply our correlation method to two software systems, LSEdit and Apache Ant. The results of these exploratory case studies provide evidence of similarity between the content of free-form emails among developers and the actual modifications in the code. We identify a set of correlation patterns between discussion and changed code vocabularies, and we find that some releases labelled as minor should instead fall under the major category. These patterns can be used to estimate the type of a change and the time needed to implement it.
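    The exact-matching step can be sketched as follows, under stated assumptions. Changed identifiers are split into words at camelCase and non-letter boundaries, email text is lowercased and tokenized, and the common concepts are the set intersection. The splitting rules, the helper names, and the sample identifiers and email are hypothetical. The paper's actual preprocessing (for example, stop-word removal or stemming) may differ.

```python
import re
from typing import Iterable, Set

# Splits "XMLParser" -> XML, Parser and "getBuildFile" -> get, Build, File.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def identifier_terms(identifiers: Iterable[str]) -> Set[str]:
    """Split camelCase / snake_case identifiers into lowercase words."""
    terms: Set[str] = set()
    for ident in identifiers:
        for part in re.split(r"[^A-Za-z]+", ident):
            terms.update(w.lower() for w in _WORD.findall(part))
    return terms


def email_terms(text: str) -> Set[str]:
    """Lowercase alphabetic tokens of a free-form email body."""
    return set(re.findall(r"[a-z]+", text.lower()))


def common_concepts(identifiers: Iterable[str], email: str) -> Set[str]:
    """Exact string matches between changed-code and discussion vocabularies."""
    return identifier_terms(identifiers) & email_terms(email)


changed = ["BuildFileParser", "copyTask", "setVerbose"]
email = "The parser fails when the build file uses a copy task with verbose output."
print(sorted(common_concepts(changed, email)))
# ['build', 'copy', 'file', 'parser', 'task', 'verbose']
```

    Because the match is exact, inflected forms such as "tasks" would not match "task". That is one reason a real pipeline might add normalization before intersecting.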

    Mining modern repositories with elasticsearch

    Organizations are generating, processing, and retaining data at a rate that often exceeds their ability to analyze it effectively. At the same time, the insights derived from these large data sets are often key to the success of the organizations, allowing them to better understand how to solve hard problems and thus gain competitive advantage. Because this data is so fast-moving and voluminous, it is increasingly impractical to analyze using traditional offline, read-only relational databases. Recently, new “big data” technologies and architectures, including Hadoop and NoSQL databases, have evolved to better support the needs of organizations analyzing such data. In particular, Elasticsearch, a distributed full-text search engine, explicitly addresses issues of scalability, big data search, and performance that relational databases were simply never designed to support. In this paper, we reflect upon our own experience with Elasticsearch and highlight its strengths and weaknesses for performing modern mining software repositories research.
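    As an illustration of the kind of full-text query the paper refers to, the snippet below builds an Elasticsearch Query DSL request body. It uses a `bool` query: a scored `match` clause on commit message text, plus a non-scoring `range` filter on a date field. The body could be sent as JSON to an index's `_search` endpoint. The index layout, the field names `message` and `committed_at`, and the helper name are assumptions for this sketch, not taken from the paper. The code only constructs the body and does not contact a cluster.

```python
import json
from typing import Any, Dict, Optional


def commit_search_body(text: str, since: Optional[str] = None, size: int = 10) -> Dict[str, Any]:
    """Build a Query DSL body: full-text match on a (hypothetical) commit
    `message` field, optionally restricted to commits on or after `since`."""
    query: Dict[str, Any] = {"bool": {"must": [{"match": {"message": text}}]}}
    if since is not None:
        # Clauses in filter context restrict matches but do not affect scoring.
        query["bool"]["filter"] = [{"range": {"committed_at": {"gte": since}}}]
    return {"query": query, "size": size}


body = commit_search_body("null pointer exception", since="2015-01-01")
print(json.dumps(body, indent=2))
```

    Placing the date restriction in filter context rather than in `must` keeps relevance scores driven only by the text match. Elasticsearch can also cache filter clauses.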

    High frequency of BRCA1, but not CHEK2 or NBS1 (NBN), founder mutations in Russian ovarian cancer patients

    Background: A significant portion of ovarian cancer (OC) cases is caused by germ-line mutations in the BRCA1 or BRCA2 genes. BRCA testing is inexpensive in populations with a founder effect and is therefore recommended for all patients diagnosed with OC. Recurrent mutations constitute the vast majority of BRCA defects in Russia; however, their impact on OC morbidity has not yet been systematically studied. Furthermore, the Russian population is characterized by a relatively high frequency of CHEK2 and NBS1 (NBN) heterozygotes, but it remains unclear whether these two genes contribute to OC risk.
    Methods: The study included 354 OC patients from two distinct, geographically remote regions: 290 from North-Western Russia (St. Petersburg) and 64 from the south of the country (Krasnodar). DNA samples were tested by allele-specific PCR for the presence of 8 founder mutations (BRCA1 5382insC, BRCA1 4153delA, BRCA1 185delAG, BRCA1 300T>G, BRCA2 6174delT, CHEK2 1100delC, CHEK2 IVS2+1G>A, NBS1 657del5). In addition, literature data on the occurrence of BRCA1, BRCA2, CHEK2, and NBS1 mutations in non-selected ovarian cancer patients were reviewed.
    Results: The BRCA1 5382insC allele was detected in 28/290 (9.7%) OC cases from the North-West and in 11/64 (17.2%) OC patients from the South of Russia. In addition, 4 BRCA1 185delAG, 2 BRCA1 4153delA, 1 BRCA2 6174delT, 2 CHEK2 1100delC, and 1 NBS1 657del5 mutations were detected. One patient from Krasnodar was heterozygous for both the BRCA1 5382insC and NBS1 657del5 variants.
    Conclusion: Founder BRCA1 mutations, especially the BRCA1 5382insC variant, are responsible for a substantial share of OC morbidity in Russia; therefore, DNA testing should be considered for every OC patient of Russian origin. Taken together with literature data, this study does not support a contribution of CHEK2 to OC risk, while the role of NBS1 heterozygosity may require further clarification.

    25th annual computational neuroscience meeting: CNS-2016

    The same neuron may play different functional roles in the neural circuits to which it belongs. For example, neurons in the Tritonia pedal ganglia may participate in different phases of the swim motor rhythms [1]. Such neuronal functional variability is likely to play a major role in delivering the functionality of neural systems, but it is difficult to study in most nervous systems. We work on the pyloric rhythm network of the crustacean stomatogastric ganglion (STG) [2]. Network models of the STG typically treat neurons of the same functional type (e.g., PD neurons) as a single model neuron. This assumes the same conductance parameters for these neurons and implies their synchronous firing [3, 4]. However, simultaneous recordings of PD neurons show differences between the timings of their spikes, which may indicate functional variability of these neurons. Here we modelled the two PD neurons of the STG separately in a multi-neuron model of the pyloric network. Our neuron models comply with known correlations between the conductance parameters of ionic currents. Our results reproduce the experimental finding that the time distance between spikes originating from the two model PD neurons increases during their synchronised burst phase. The PD neuron with the larger calcium conductance generates its spikes before the other PD neuron. Larger potassium conductance values in the follower neuron imply longer delays between spikes (see Fig. 17). Neuromodulators change the conductance parameters of neurons while maintaining the ratios of these parameters [5]. Our results show that such changes may shift the individual contributions of the two PD neurons to the PD phase of the pyloric rhythm, altering their functionality within this rhythm. Our work paves the way towards an accessible experimental and computational framework for analysing the mechanisms and impact of functional variability of neurons within the neural circuits to which they belong.

    msr14

    The MSR 2014 challenge dataset is a (very) trimmed-down version of the original GHTorrent dataset. It includes data from the top 10 starred software projects for each of the top programming languages on GitHub, which gives 90 projects and their forks. For each project, we retrieved all data, including issues, pull requests, organizations, followers, stars, and labels (milestones and events are not included). The dataset was constructed from scratch to ensure that it contains the latest information. More information at http://openscience.us/repo/msr/msr14.html
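    The project-selection rule (top 10 starred projects per language, so 90 projects implies nine languages) can be sketched over in-memory records. The `Repo` record, the helper name, and the sample data are hypothetical, and this does not query GHTorrent itself. Choosing which languages count as "top" is assumed to have happened before this step.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Repo(NamedTuple):
    name: str
    language: str
    stars: int


def top_starred_per_language(repos: Iterable[Repo], k: int = 10) -> Dict[str, List[Repo]]:
    """Keep the k most-starred repositories for every language present in `repos`."""
    by_language: Dict[str, List[Repo]] = defaultdict(list)
    for repo in repos:
        by_language[repo.language].append(repo)
    # Most stars first; ties broken by name so the selection is deterministic.
    return {
        lang: sorted(group, key=lambda r: (-r.stars, r.name))[:k]
        for lang, group in by_language.items()
    }


repos = [
    Repo("a", "Java", 500), Repo("b", "Java", 900), Repo("c", "Java", 100),
    Repo("d", "Ruby", 300), Repo("e", "Ruby", 300), Repo("f", "Go", 50),
]
selected = top_starred_per_language(repos, k=2)
for lang, group in selected.items():
    print(lang, [r.name for r in group])
```

    With k = 10 and nine languages, this rule yields at most 90 projects. A language with fewer than 10 candidate repositories simply contributes fewer.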

    Message from the MSR 2023 Junior PC Co-Chairs

    Correlating Social Interactions to Release History During Software Evolution

    In this paper, we propose a method to reason about the nature of software changes by mining and correlating discussion archives. We employ an information retrieval approach to find correlations between the source code change history and the history of social interactions surrounding these changes. We apply our correlation method to two software systems, LSEdit and Apache Ant. The results of these exploratory case studies provide evidence of similarity between the content of free-form emails among developers and the actual modifications in the code. We identify a set of correlation patterns between discussion and changed code vocabularies, and we find that some releases labelled as minor should instead fall under the major category. These patterns can be used to estimate the type of a change and the time needed to implement it.
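    The abstract names an information retrieval approach without specifying the measure. One standard IR measure that fits is cosine similarity between term-frequency vectors; it is a swapped-in illustration, not necessarily the paper's method. The sample email and the changed-code words (assumed to be already split from identifiers) are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical inputs: one email body and the (already split) words of changed identifiers.
email = "Fix the copy task: copy fails on symlinks"
changed_code = "copy task symlink resolver"
print(round(cosine_similarity(term_vector(email), term_vector(changed_code)), 3))  # 0.474
```

    Only "copy" and "task" overlap here; "symlinks" and "symlink" do not match without stemming. Unlike an exact-match set intersection, the score also weights repeated terms such as the second "copy".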

    Investigating the android apps' success: An empirical study

    Measuring the success of software systems was not a trivial task in the past. Nowadays, mobile apps provide a uniform measure of success: the average rating given by an app's users. Recent research has examined the relationship between change- and fault-proneness and apps' lack of success, and has qualitatively analyzed the reasons behind app users' dissatisfaction. However, there is little empirical evidence on the factors related to the success of mobile apps. In this paper, we explore the relationships between mobile apps' success and a set of metrics that characterize not only the apps themselves but also the quality of the APIs they use and the attributes of users when they interact with the apps. In particular, we measure API quality in terms of bugs fixed in the APIs used by apps and changes that occurred in the API methods. We examine different kinds of changes, including changes in interfaces, implementation, and exception handling. For user-related factors, we leverage the number of an app's downloads and installations and its users' reviews. Through an empirical study of 474 free Android apps, we find that factors such as the number of user reviews for an app, the app's category, and its size appear to have an impact on the app's success.
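    The abstract does not name the statistical test relating metrics to success. As an illustrative sketch only, Spearman's rank correlation is a common choice for skewed counts such as review numbers versus average rating. Ranks are averaged over ties, and Pearson correlation is computed on the ranks. The review counts and ratings below are hypothetical toy values.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("a constant input has no rank correlation")
    return cov / (sx * sy)


# Hypothetical apps: number of user reviews vs. average star rating.
reviews = [12, 340, 55, 1020, 8]
ratings = [3.1, 4.2, 3.9, 4.5, 2.8]
print(round(spearman(reviews, ratings), 3))
```

    Rank-based correlation is insensitive to the heavy right tail typical of download and review counts. It says nothing about causation, and categorical factors such as app category would need a different test.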